Journal of the College of Physicians and Surgeons Pakistan
ISSN: 1022-386X (PRINT)
ISSN: 1681-7168 (ONLINE)
doi: 10.29271/jcpsp.2025.11.1385
ABSTRACT
Objective: To assess the accuracy and clinical applicability of YOLO-based segmentation models for detecting Schistosoma eggs in urinary bladder histopathology slide images, focusing on both bounding box and mask segmentation tasks.
Study Design: A descriptive, cross-sectional study.
Place and Duration of the Study: Artificial Intelligence Technology Centre, National Centre for Physics, Islamabad, Pakistan, from September to November 2024.
Methodology: A high-quality dataset was compiled using histopathological slides obtained from real patient samples available on the open-source platform PathPresenter. All images were meticulously annotated by expert histopathologists. The dataset included 681 images containing 2,751 schistosomes, divided into 476 training images (1,932 schistosomes), 136 validation images (539 schistosomes), and 69 testing images (280 schistosomes). Data pre-processing techniques were applied to optimise the quality of training and evaluation datasets. Multiple YOLO-based segmentation models, such as YOLOv5, YOLOv8, YOLOv9, and YOLOv11 variants (n/s/m/l/x/c/e), were trained and evaluated for both bounding box and mask detection. Model performance was evaluated using precision, recall, F1 score, mean Average Precision at 50% Intersection over Union (IoU; mAP50), and mAP across 50 to 95% IoU (mAP50-95) for both bounding box and mask segmentation tasks.
Results: Among the models, YOLOv8l demonstrated the highest diagnostic accuracy, achieving an F1 score of 95.09 and a mAP50 of 96.8 for bounding box detection. For mask detection, it attained an F1 score of 94.19 and an mAP50 of 96.2. YOLOv5m and YOLOv5x also performed well, balancing accuracy with computational efficiency. Smaller models exhibited limitations in sensitivity and precision.
Conclusion: YOLO-based segmentation models exhibit strong potential for automated detection of schistosomiasis in urinary bladder histopathology images. However, large-scale validation studies on larger datasets are required to confirm these findings.
Key Words: Deep learning, Mask segmentation, Medical image analysis, Schistosomiasis, YOLO segmentation.
INTRODUCTION
Schistosomiasis is a chronic tropical parasitic disease caused by Schistosoma trematode blood flukes, affecting over 700 million people across 78 countries, primarily in low- and middle-income countries.1,2 Conventional microscopy remains the standard for detecting S. haematobium infection through egg identification and quantification in stool/urine samples.3
However, this method is labour-intensive, expertise-dependent, and requires laboratory infrastructure, limiting its use in resource-poor areas. Advanced techniques such as real-time PCR and lateral flow tests offer higher sensitivity or specificity but face similar infrastructure constraints.3-5
Schistosomiasis presents in urinary bladder biopsies as characteristic oval eggs with terminal spines, typically surrounded by granulomatous inflammation, fibrosis, or calcification.3 Chronic infection may result in complications such as bladder fibrosis, hydronephrosis, and an increased risk of squamous cell carcinoma. Early identification enables timely treatment and helps prevent severe long-term complications.3,6 Automated digital microscopes integrated with AI algorithms offer customisable tools for schistosomiasis diagnosis.6-8 Recent studies have validated their diagnostic accuracy for S. haematobium, reporting sensitivity ranging from 32 to 91% compared to conventional microscopy.9 A two-stage deep learning framework achieved 93.75% sensitivity, 93.94% specificity, and 93.75% precision.10
Traditional diagnostic methods in resource-limited areas are slow and error-prone. This study used YOLO-based segmentation models to precisely outline egg locations. By leveraging pathologist-annotated data, the research aimed to provide a fast, accurate, and practical solution to assist pathologists, reduce their workload, and enable timely and consistent diagnoses.
METHODOLOGY
This study was conducted at the Artificial Intelligence Technology Centre, National Centre for Physics, Islamabad, Pakistan, from September to November 2024. Ethical approval was exempted (Ref No. 2025-7717-23671), and the study was not funded. A systematic approach was used to detect schistosomiasis using YOLO-based segmentation models. YOLO is a deep learning-based object detection framework that detects and classifies objects in images at high speed. It processes each image in a single pass, dividing it into grid cells and predicting bounding boxes and class probabilities simultaneously. It has been reported to be highly accurate and efficient in deep learning-based helminth egg detection.10,11
Schistosomes have a complex and variable morphology, including irregular shapes and clustering patterns in tissue sections. Therefore, segmentation models, which divide an image into semantically meaningful parts, offer greater precision than standard object detection methods, making them more suitable for high-throughput histopathological analysis. Detection models are faster, but overlapping parasites and greater computing requirements have made them less useful for this task.
For dataset preparation, histopathological images obtained from PathPresenter, an open-source pathology database, with resolutions of 1200 x 1600, 1600 x 1200, and 1860 x 890 were included in the study. The manual annotation of Schistosoma eggs was performed by two histopathologists from United Medical and Dental College (UMDC) and Jinnah Sindh Medical University (JSMU) to ensure accuracy and clinical relevance. Annotation refinement was carried out using LabelMe. To ensure diagnostic reliability, ambiguous images that could not be confidently annotated, those of poor diagnostic quality, and those not meeting the required resolution standards for annotation were excluded from the dataset. Data pre-processing techniques included normalisation, sharpening, resizing, and histogram equalisation to enhance image quality for AI training. Data augmentation techniques—such as flipping, rotation, Mix-Up, Copy-Paste, and mosaic transformations—were applied to 40% of the training set to improve model generalisation. The dataset was divided into training (70%), validation (20%), and testing (10%) subsets. Table I(A) shows the data distribution, confirming the integrity and representativeness of these splits.
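The split sizes in Table I(A) can be checked against the stated 70/20/10 proportions. The sketch below is a minimal illustration, assuming that the training and validation counts were rounded down and that the testing set took the remaining images; the function names, the seed, and the shuffling step are hypothetical and are not taken from the study.

```python
import random


def split_counts(n_images, train_frac=0.70, val_frac=0.20):
    """Return (train, val, test) sizes; the test set takes the remainder."""
    n_train = int(n_images * train_frac)  # rounded down (assumption)
    n_val = int(n_images * val_frac)      # rounded down (assumption)
    return n_train, n_val, n_images - n_train - n_val


def split_items(items, seed=0):
    """Shuffle reproducibly, then slice into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


print(split_counts(681))  # (476, 136, 69)
```

Under these assumptions, the 681 images yield 476, 136, and 69 images, matching the reported split.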
The YOLO versions used in the study were YOLOv5, YOLOv8, YOLOv9, and YOLOv11. Hyperparameters were optimised to enhance segmentation accuracy and computational efficiency. The important hyperparameters are shown in Table I(B).
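The hyperparameter names in Table I(B) resemble the training arguments of the Ultralytics Python package. The following is a minimal sketch of how such a configuration might be expressed, assuming that package was used; the dataset file name `schisto.yaml` and the helper functions are hypothetical, and the study's actual training script is not reproduced here.

```python
# Values are copied from Table I(B). The argument names (e.g. lr0 for the
# initial learning rate) assume the Ultralytics API, which the paper does
# not explicitly confirm.
SMALL_BATCH_VARIANTS = {"l", "x", "e"}  # trained with batch size 4


def train_kwargs(variant, data_yaml="schisto.yaml"):
    """Build training arguments for one model variant (n/s/m/l/x/c/e)."""
    return {
        "data": data_yaml,  # hypothetical dataset configuration file
        "epochs": 100,
        "batch": 4 if variant in SMALL_BATCH_VARIANTS else 8,
        "workers": 8,
        "optimizer": "AdamW",
        "seed": 0,
        "deterministic": True,
        "single_cls": False,
        "lr0": 0.002,
        "momentum": 0.9,
        "weight_decay": 0.0005,
        "warmup_epochs": 3,
        "warmup_momentum": 0.8,
        "warmup_bias_lr": 0.1,
        "mask_ratio": 4,
    }


def run_training(weights="yolov8l-seg.pt", variant="l"):
    """Train one segmentation model; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected alone

    model = YOLO(weights)  # a "-seg" checkpoint selects the segmentation task
    return model.train(**train_kwargs(variant))
```

Keeping the arguments in a plain dictionary makes it straightforward to reuse one configuration across variants, changing only the batch size.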
All models were trained on an NVIDIA RTX 4060 GPU (8 GB VRAM) with an Intel Core i9-14900KF processor and 64 GB RAM by the author, Aqsa Abu Bakar, from AITec. This hardware configuration was sufficient to handle the computational demands of the dataset and the resource-intensive training processes.
To evaluate the accuracy and effectiveness of the YOLO models for Schistosoma egg detection, standard evaluation metrics for object detection and segmentation were calculated for both the bounding box and mask segmentation tasks. All metrics were derived from confusion-matrix counts: true positives (TP) were regions correctly identified as eggs, true negatives (TN) were correctly identified non-egg regions, false positives (FP) were regions wrongly classified as eggs, and false negatives (FN) were actual eggs missed by the model. Precision, the proportion of positive predictions that were correct, was calculated as TP / (TP + FP). Recall, the proportion of actual eggs correctly detected, was calculated as TP / (TP + FN). The F1 score, the harmonic mean of precision and recall, was calculated as 2 × (Precision × Recall) / (Precision + Recall) and provides a balanced measure of performance. Mean Average Precision (mAP) was reported at a 50% Intersection over Union (IoU) threshold (mAP50) and as the average over the ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (mAP50-95), calculated as (1 / 10) × Σ (AP at each IoU threshold from 0.50 to 0.95), providing a comprehensive assessment of detection quality across increasingly strict overlap requirements.
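The formulas above can be written as a short sketch. The function names are illustrative only, and the boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates; the final line applies the F1 formula to the YOLOv8l bounding box precision and recall listed in Table II.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    """TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def map50_95(ap_per_threshold):
    """Mean of the AP values at the ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    if len(ap_per_threshold) != 10:
        raise ValueError("expected one AP value per IoU threshold (10 in total)")
    return sum(ap_per_threshold) / 10


print(round(f1_score(94.2, 96.0), 2))  # 95.09
```

Applying the same function to the YOLOv8l mask precision and recall (93.4 and 95) gives 94.19, in agreement with Table II.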
Table I: (A) Data split. (B) Hyperparameters used during training.

(A)
| Dataset split | Images | Schistosomes |
|---|---|---|
| Training | 476 | 1,932 |
| Validation | 136 | 539 |
| Testing | 69 | 280 |
| Total | 681 | 2,751 |

(B)
| Hyperparameters | Values |
|---|---|
| Task | Segment |
| Epochs | 100 |
| Batch | 8 (n, s, m, c) / 4 (l, x, e) |
| Workers | 8 |
| Optimiser | AdamW |
| Seed | 0 |
| Deterministic | True |
| Single_cls | False |
| Learning rate (Lr) | 0.002 |
| Momentum | 0.9 |
| Weight_decay | 0.0005 |
| Warmup_epochs | 3 |
| Warmup_momentum | 0.8 |
| Warmup_bias_lr | 0.1 |
| Mask_ratio | 4 |
Table II: Evaluation metrics of various YOLO models.

| Models | Box (P) | Box (R) | Box (F1 score) | Box (mAP50) | Box (mAP50-95) | Mask (P) | Mask (R) | Mask (F1 score) | Mask (mAP50) | Mask (mAP50-95) |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv5n | 92.7 | 86 | 89.22 | 92.4 | 72 | 92.7 | 86 | 89.22 | 92.6 | 68 |
| YOLOv5s | 91.7 | 88.1 | 89.86 | 93.5 | 75.2 | 92.3 | 87 | 89.57 | 92.7 | 71.8 |
| YOLOv5m | 91.2 | 93.2 | 92.19 | 96.9 | 80.2 | 91.2 | 93.2 | 92.19 | 96.9 | 75.7 |
| YOLOv5l | 87.7 | 90 | 88.84 | 95.1 | 79.5 | 87.7 | 90 | 88.84 | 94.1 | 75.7 |
| YOLOv5x | 93.8 | 90.4 | 92.07 | 95.7 | 77.4 | 93.2 | 92 | 92.6 | 96.6 | 75.2 |
| YOLOv8n | 91.6 | 87.3 | 89.4 | 94.8 | 74.9 | 89.5 | 85.4 | 87.4 | 93.2 | 70 |
| YOLOv8s | 90.5 | 86 | 88.19 | 93 | 76.3 | 90.5 | 86 | 88.19 | 93.1 | 72.6 |
| YOLOv8m | 92.4 | 90 | 91.18 | 93.2 | 78.1 | 89.7 | 93 | 91.32 | 93.9 | 72.7 |
| YOLOv8l | 94.2 | 96 | 95.09 | 96.8 | 81.1 | 93.4 | 95 | 94.19 | 96.2 | 77 |
| YOLOv8x | 93.3 | 85 | 88.96 | 92.9 | 76.8 | 93.3 | 85 | 88.96 | 93.8 | 73 |
| YOLOv9c | 81 | 91 | 85.71 | 92.7 | 73.7 | 81 | 91 | 85.71 | 92.5 | 69.7 |
| YOLOv9e | 91.3 | 88 | 89.62 | 94.3 | 74.4 | 91.3 | 88 | 89.62 | 94.1 | 70.1 |
| YOLOv11n | 84.2 | 90.5 | 87.24 | 94 | 72.5 | 83.3 | 89.5 | 86.29 | 93.2 | 68.4 |
| YOLOv11s | 92 | 90 | 90.99 | 92.5 | 76 | 92 | 90 | 90.99 | 92.5 | 70.8 |
| YOLOv11m | 91.2 | 93 | 92.09 | 95.6 | 77.4 | 90.8 | 92 | 91.4 | 94.9 | 74 |
| YOLOv11l | 89.2 | 91.2 | 90.19 | 95.2 | 77.7 | 89.2 | 91.2 | 90.19 | 94.8 | 72.6 |
| YOLOv11x | 86.4 | 92 | 89.11 | 92.7 | 75.1 | 86.4 | 92 | 89.11 | 93.1 | 69.7 |

P: Precision; R: Recall; mAP: Mean average precision.
Figure 1: (A) F1 score of different variants of YOLO. (B) mAP@50 of different variants of YOLO. (C) mAP@50-95 of different variants of YOLO on the schistosomiasis dataset.
Model selection was based on scores obtained on the held-out validation set (136 images, 539 schistosomes), where models were ranked by F1 score and mAP50 to identify the best performer for Schistosoma egg detection. The differences were calculated from the confusion-matrix counts (TP, TN, FP, and FN) on the validation set, with the final evaluation confirmed on the testing set (69 images, 280 schistosomes). Statistical analysis was performed using Python-based tools, specifically the SciPy and NumPy libraries for metric calculations, to support a robust and reproducible evaluation of model effectiveness.
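The ranking step can be illustrated with a short sketch using a subset of the bounding box values from Table II. Sorting by F1 score first and using mAP50 only to break ties is an assumption made for this example; the text does not state how the two criteria were combined.

```python
# (F1 score, mAP50) for bounding box detection, copied from Table II (subset).
BOX_SCORES = {
    "YOLOv5m": (92.19, 96.9),
    "YOLOv5x": (92.07, 95.7),
    "YOLOv8l": (95.09, 96.8),
    "YOLOv9c": (85.71, 92.7),
    "YOLOv11m": (92.09, 95.6),
}


def rank_models(scores):
    """Order model names by F1 score, then mAP50, both descending."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


ranking = rank_models(BOX_SCORES)
print(ranking[0])  # YOLOv8l
```

With this ordering, YOLOv8l ranks first even though YOLOv5m has a marginally higher mAP50 (96.9 versus 96.8), because the F1 score is compared first.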
RESULTS
Training time varied across YOLO versions. Within the YOLOv5 series, YOLOv5n trained the fastest (25.38 minutes), while YOLOv5x took the longest (63.96 minutes). In the YOLOv8 series, YOLOv8n was the fastest model overall (8.58 minutes), and YOLOv8x took 71.52 minutes. YOLOv9e required the longest training time of all models (81.54 minutes). For the YOLOv11 series, YOLOv11n took 9.18 minutes, and YOLOv11x finished training in 69.96 minutes. Smaller models trained faster because they have fewer parameters, whereas the more complex models required longer training. The shorter training times of the nano variants of YOLOv8 and YOLOv11 compared with YOLOv5n suggest improved processing efficiency in the newer architectures.
The performance of the various YOLO models on the segmentation task for Schistosoma egg detection is shown in Table II. Performance within each series was compared for both bounding box and mask metrics in terms of precision, recall, and F1 score (Figure 1A), mAP@50 (Figure 1B), and mAP@50-95 (Figure 1C). The YOLO versions were also compared with respect to model scale and architecture (Figure 1C). The best model performed well, with only a few FP, which could be reduced further by raising the confidence threshold.
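The effect of the confidence threshold on FP can be illustrated with a small sketch. The detections and the ground-truth count below are hypothetical values chosen for illustration and are not data from this study.

```python
# Hypothetical detections as (confidence, is_true_positive); not study data.
DETECTIONS = [
    (0.95, True), (0.90, True), (0.80, True), (0.60, False),
    (0.55, True), (0.40, False), (0.30, False),
]
N_GROUND_TRUTH = 5  # hypothetical number of annotated eggs


def pr_at_threshold(detections, n_ground_truth, conf):
    """Return (precision, recall, false positives) for detections with score >= conf."""
    kept = [is_tp for score, is_tp in detections if score >= conf]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    return precision, recall, fp


for conf in (0.25, 0.50, 0.70):
    print(conf, pr_at_threshold(DETECTIONS, N_GROUND_TRUTH, conf))
```

In this toy example, raising the threshold from 0.25 to 0.50 removes two FP without losing any TP, whereas raising it further to 0.70 removes the last FP at the cost of recall, so the threshold has to be chosen with this trade-off in mind.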
The bounding box, segmentation, classification, and distribution focal losses declined consistently during training, as shown in Figure 2A. The validation losses followed a similar trend, indicating that the model generalised to new data without overfitting. Precision, recall, and mAP increased progressively across epochs, and the final mAP@0.5 and mAP@0.5-0.95 values indicated excellent detection performance. This progression showed robust feature learning and effective performance in both detection and classification tasks. YOLOv8l achieved a high mAP@0.5 of 0.922 for detecting Schistosoma eggs (Figure 2B), demonstrating excellent diagnostic accuracy.
Figure 2: (A) Training results for YOLOv8l. (B) Precision-recall curve for YOLOv8l training.
Figure 3: (A) Bounding box detection. (B) Mask segmentation task. (C) Predictions of YOLOv8l on unseen data.
The precision-recall curve confirmed a strong balance between precision and recall, minimising FP while maintaining high recall. For bounding box detection (Figure 3A), larger models generally outperformed smaller ones, with YOLOv8l leading at an F1 score of 95.09 and mAP50 of 96.8 (Figures 1A, B). YOLOv5m also performed well (F1 score: 92.19, mAP50-95: 80.2, Figure 1C). In contrast, YOLOv9c and YOLOv11n, due to their smaller size, had lower F1 scores of 85.71 and 87.24, respectively (Figure 1A).
Mask detection (Figure 3B) is the identification and localisation of specific regions in an image, highlighting regions of interest such as parasites, cells, or abnormalities, and is a more complex task than bounding box detection. It also favoured YOLOv8l (F1: 94.19, mAP50: 96.2, Figures 1A, B). YOLOv5m followed closely (F1: 92.19, mAP50: 96.9). Larger models, such as YOLOv5x and YOLOv8m, maintained strong performance (F1: 92.6 and 91.32, respectively; Table II). However, smaller models (YOLOv11n and YOLOv9c) showed reduced accuracy (Figure 1A), likely because their fewer parameters restrict their ability to learn the complex features required. When YOLOv8l was tested on unseen histopathology slides (Figure 3C), it maintained robust performance, highlighting its potential for real-world histopathological applications.
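For mask segmentation, overlap is measured pixel by pixel rather than between rectangles. The sketch below is a minimal illustration of IoU for binary masks, assuming masks are represented as equally sized lists of 0/1 rows; the example masks are hypothetical and are not taken from the dataset.

```python
def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized lists of 0/1 rows."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel labelled as egg in both masks
            if p or t:
                union += 1  # pixel labelled as egg in at least one mask
    return inter / union if union else 0.0


# Hypothetical 3 x 4 masks for illustration only.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

print(mask_iou(pred, truth))  # 0.6
```

Here three pixels are shared and five are covered by at least one mask, giving an IoU of 0.6, which would count as a match at the 50% threshold but not at stricter thresholds such as 75%.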
DISCUSSION
Schistosomiasis remains a major parasitic disease, particularly in endemic regions where conventional microscopy is labour-intensive and resource-dependent.1,3 AI-based tools offer an alternative diagnostic approach, improving both efficiency and accuracy.12,13 This study evaluated YOLO-based segmentation models for detecting Schistosoma eggs in histopathological images, with YOLOv8l demonstrating the highest accuracy.
This study used an open-source dataset with expert histopathologist annotations. YOLOv8l achieved an F1 score of 95.09 and mAP@50 of 96.8 in bounding box detection, outperforming earlier models. Previous studies using YOLOv5m reported an F1 score of 92.4 and mAP@50 of 90.8,12 while a Faster region-based convolutional neural network (Faster R-CNN) achieved a mAP@50 of 88.7.13 The superior feature extraction and recall of YOLOv8l contributed to its improved performance.
For mask segmentation, YOLOv8l achieved an F1 score of 94.19 and mAP@50 of 96.2, surpassing YOLOv5 (F1 of 89.5) in helminth segmentation. The smooth decline in training and validation loss indicated effective optimisation and generalisation on unseen data, confirming its clinical reliability, similar to the trends reported in other studies on parasitic egg-identification using a convolution and attention network.14,15
Previous studies have shown the potential of computational techniques on digital images.16,17 Colley et al. highlighted that the ability to diagnose schistosomiasis rapidly and accurately is crucial in endemic regions, where delayed treatment often results in severe morbidity.18 Another study supported the need for advanced, automated tools, such as AI, to complement traditional diagnostics, which can be slow and resource-intensive in endemic areas.19 CNNs and other deep learning models, such as YOLO, can offer a faster and more reliable solution by automating the detection of parasitic eggs in histopathological images, thus significantly improving diagnostic speed and accuracy.10,20-22 Wang et al. highlighted that YOLOv5 could improve in accuracy and robustness when combined with other AI techniques.23 The findings of this study establish YOLOv8l as a highly accurate model for Schistosoma egg detection in histopathological images. Its improved performance over previous models suggests potential integration into histopathology workflows. The combination of detection and segmentation in AI-based diagnostic models presents a significant leap forward in the diagnosis of parasitic diseases, as highlighted by another study.24 Future research should focus on real-time optimisation and clinical deployment of YOLOv8 models for parasitic disease diagnostics. Larger-scale studies are recommended to validate these findings further.
CONCLUSION
YOLOv8l outperformed other YOLO models in Schistosoma egg detection, excelling in both bounding box and mask segmentation tasks, making it the best option when precision and recall are crucial. YOLOv5m and YOLOv5x provided a good balance of accuracy and efficiency, making them suitable for assisting histopathologists. Smaller models showed limitations, reinforcing the advancements in the architecture of YOLO. The model selection should, therefore, be based on clinical requirements and computational constraints, supporting AI-assisted diagnostics in histopathology.
ETHICAL APPROVAL:
Ethical approval was exempted (Ref No. 2025-7717-23671) as this study utilised data from PathPresenter, an open-source pathology database. No human subjects were directly involved in data collection, and all images were anonymised and publicly available.
COMPETING INTEREST:
The authors declared no conflict of interest.
AUTHORS’ CONTRIBUTION:
MS: Conceived the research idea and designed the study.
AAB: Collected data, implemented computational tools, wrote the manuscript, and generated results.
TZ: Critically analysed the manuscript and assisted in data collection.
SB: Provided histopathological annotation expertise.
SP, AJ: Contributed to manuscript writing and critical review.
All authors approved the final version of the manuscript to be published.
REFERENCES